| Element | Brief Description |
|---|---|
| Code name | foood |
| Project title | Exploring the Relationship Between COVID-19 and Dietary Health |
| Authors | |
| Affiliation | INFO-201: Technical Foundations of Informatics - The Information School - University of Washington |
| Date | February 18th, 2022 |
| Abstract | This project uses global food and COVID-19 data to explore the relationship between diet and COVID-19 mortality. Correlations between these two factors could indicate that (1) healthy diets improve COVID-19 outcomes, (2) countries with enough food are better equipped to combat COVID, or (3) a combination of both. |
| Keywords | dietary health, COVID-19, public health, food |
| 1.0 Introduction | Over the last two years, the COVID-19 pandemic has swept across the world. During this time, scientists, politicians, and world leaders have been trying to find a way to return to normal. By finding correlations between dietary data and COVID-19 statistics, we hope to gain valuable information on how diet affects COVID-19 mortality. More specifically, we will explore correlations between COVID-19 mortality and nations’ macronutrient consumption through the COVID-19 Healthy Diet Dataset. By understanding these data, we can better understand dietary health’s relationship with our immune systems. |
| 2.0 Design Situation | |
| 3.0 Research questions | |
| 4.0 The Dataset | This data set, Food Supply Kcal [(Marila Prata, 2020)](https://www.kaggle.com/mpwolke/food-supply-kcal/data), covers the global population affected by COVID-19. It also records the food supply, nutrition values, obesity percentages, malnourishment percentages, and food habits of the countries represented, along with COVID-19 variables such as deaths, active cases, and recovered cases. Having these variables together makes the correlation between global food habits and global COVID-19 cases easier to understand and puts it into perspective for the audience of the data. The set excludes the variables race and gender. While excluding them does not change the validity of the data or compromise its purpose, including them could reveal disproportionate infection rates based on these two factors and whether those rates are affected by different diets. It could also show commonalities between men’s and women’s diets and which gender tends to have a higher infection rate. The data was amassed by Kaggle user Marila Prata, who collected information from sources such as the Food and Agriculture Organization of the United Nations (http://www.fao.org/faostat/en/#home), the Population Reference Bureau (https://www.prb.org/), the Johns Hopkins Center for Systems Science and Engineering (https://coronavirus.jhu.edu/map.html), and the USDA Center for Nutrition (https://www.choosemyplate.gov/). The data was first collected at the beginning of the COVID-19 pandemic and was updated until April 2020, when updates ceased. Collecting this data was an attempt to answer the question: what non-pharmaceutical interventions could the population make in order to stop the spread of COVID-19? No funding was involved in collecting the data, as the majority of the information is publicly available. Those who could benefit, financially or otherwise, from a data set of this nature include direct stakeholders such as large healthcare organizations and the government, specifically its food and agricultural divisions. The validity of this data is strong. Several of the resources cited are official government or United Nations websites, and Johns Hopkins is a reputable source. Besides the COVID-19 information collected from Johns Hopkins, the set includes no data from sources outside of official, public information. This makes the data easier to trust, under the assumption that information originating from these official sources is honest and not exclusionary. This data was obtained from Kaggle, an online platform used to publish and find data sets. The source of the data is credited as Kaggle, and the aforementioned sources are factual and verifiable. |
| 5.0 Expected Implications | Answers to our research questions could have a positive impact on several fronts. For professional technologists, the results will support a more mature and comprehensive understanding of nutritional research, which they can use to adjust nutritional balance. For designers, knowing more about the nutritional content of food provides more comprehensive theoretical knowledge to think about and apply in practice. Finally, for policymakers, the nutritional results could inform rules and requirements for the food industry that protect people’s health. During COVID-19, such actions could help people strengthen their immunity through nutritional balance so that they are better able to resist the coronavirus. During a pandemic, adequate nutritional balance is critical to combating the virus, and that is something policymakers will consider. |
| 6.0 Limitations | One limitation we need to consider is the limited time frame of this data. The data stopped being updated in April 2020, so it only encompasses the very beginning of the COVID-19 pandemic, which makes it difficult to be certain that nourishment played the largest role in these COVID-19 statistics. Additionally, we miss how nourishment played a role in the later spread of the disease as new variants emerged. Furthermore, we have to take into account different collection biases when comparing countries’ COVID-19 mortality rates. Geographical variation in COVID-19 strains is another consideration, as the strain circulating in America may not be the same as the one in Europe and likely has different effects. Lastly, we have to consider how different cultures view food, how limited a commodity it is in some nations, and how that might affect the data we are analyzing. |
| Acknowledgements | I’d like to thank our TA, our professor, and my team members, because it is through them that we came to understand more about data. |
| References | |
| Appendix A: Questions | No questions so far! Thanks for asking :smile: |
Our data set includes food consumption and COVID-19 statistics at the country level. With these data, we can analyze how diet relates to COVID-19 outcomes and what that implies about population health. Countries combating COVID-19 have a median mortality rate of 0.012 and an average mortality rate of 0.018. Because the mean exceeds the median, the death rate distribution skews right. As for obesity, countries experience obesity at a median rate of 21.2% and an average rate of 18.7%. Here the mean falls below the median, so the obesity rate distribution skews left. For a better look at our data, here is a summary table of food and COVID-19 data for the 15 largest countries:
| Country | Population | Cases | Deaths | Fatality Ratio (Deaths to Cases) | Obesity (%) | Malnourished (%) | Animal Products (kg) | Vegetable Products (kg) |
|---|---|---|---|---|---|---|---|---|
| China | 1398030000 | 125667 | 4857 | 0.039 | 6.6 | 8.5 | 13.424 | 36.574 |
| India | 1391885000 | 42692943 | 509358 | 0.012 | 3.8 | 14.5 | 11.336 | 38.657 |
| United States of America | 329153000 | 77918466 | 922470 | 0.012 | 37.3 | <2.5 | 21.235 | 28.759 |
| Indonesia | 268419000 | 4807778 | 145176 | 0.030 | 6.9 | 8.3 | 6.258 | 43.744 |
| Pakistan | 216565000 | 1488958 | 29828 | 0.020 | 7.8 | 20.3 | 22.276 | 27.723 |
| Brazil | 209332000 | 27552267 | 639151 | 0.023 | 22.3 | <2.5 | 17.347 | 32.654 |
| Nigeria | 200964000 | 254016 | 3141 | 0.012 | 7.8 | 13.4 | 1.739 | 48.258 |
| Bangladesh | 163667000 | 1914356 | 28838 | 0.015 | 3.4 | 14.7 | 5.193 | 44.803 |
| Russia | 146731000 | 14102736 | 334093 | 0.024 | 25.7 | <2.5 | 16.152 | 33.847 |
| Mexico | 126577000 | 5292706 | 312819 | 0.059 | 28.4 | 3.6 | 15.153 | 34.846 |
| Japan | 126180000 | 3975513 | 20516 | 0.005 | 4.4 | <2.5 | 15.319 | 34.678 |
| Ethiopia | 112079000 | 467575 | 7426 | 0.016 | 3.6 | 20.6 | 5.296 | 44.699 |
| Philippines | 108117000 | 3639942 | 55094 | 0.015 | 6.0 | 13.3 | 6.711 | 43.291 |
| Egypt | 99064000 | 457081 | 23409 | 0.051 | 31.1 | 4.5 | 6.737 | 43.262 |
| Vietnam | 95656000 | 2540273 | 39037 | 0.015 | 2.1 | 9.3 | 8.576 | 41.423 |
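These data suggest that, among these 15 countries, the highest COVID-19 fatality ratios often coincide with obesity rates above 20%: Mexico (0.059) and Egypt (0.051) have obesity rates of 28.4% and 31.1%. The pattern is not uniform, however. The United States has a 37.3% obesity rate but a fatality ratio of only 0.012, while China (0.039) and Indonesia (0.030) have obesity rates below 7%.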
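The fatality ratio column and the two patterns discussed above (mean versus median, and obesity versus fatality) can be rechecked directly from the table. The following is a minimal sketch in base R that uses only the 15 rows shown in the table, not the full data set; the data frame name `largest` and its column names are chosen here for illustration.

```r
# The 15 rows of the summary table (cases, deaths, obesity %).
largest <- data.frame(
  country = c("China", "India", "United States of America", "Indonesia",
              "Pakistan", "Brazil", "Nigeria", "Bangladesh", "Russia",
              "Mexico", "Japan", "Ethiopia", "Philippines", "Egypt", "Vietnam"),
  cases = c(125667, 42692943, 77918466, 4807778, 1488958, 27552267, 254016,
            1914356, 14102736, 5292706, 3975513, 467575, 3639942, 457081,
            2540273),
  deaths = c(4857, 509358, 922470, 145176, 29828, 639151, 3141, 28838,
             334093, 312819, 20516, 7426, 55094, 23409, 39037),
  obesity = c(6.6, 3.8, 37.3, 6.9, 7.8, 22.3, 7.8, 3.4, 25.7, 28.4, 4.4,
              3.6, 6.0, 31.1, 2.1)
)

# Fatality ratio is deaths divided by cases.
largest$fatality_ratio <- largest$deaths / largest$cases

# A mean above the median points to a right-skewed distribution.
ratio_mean <- mean(largest$fatality_ratio)
ratio_median <- median(largest$fatality_ratio)

# Spearman rank correlation between obesity and fatality ratio.
obesity_rho <- cor(largest$obesity, largest$fatality_ratio,
                   method = "spearman")

print(round(largest$fatality_ratio, 3))
print(c(mean = ratio_mean, median = ratio_median))
print(obesity_rho)
```

With only 15 countries, the rank correlation is a rough indication at best; the same steps could be applied to the full data set for a more reliable picture.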
# Libraries used
library("dplyr")
library("ggplot2")
# Import the combined global food (in kilograms) and COVID-19 data
global_food_data <- read.csv("../data/global_food_and_covid.csv")
# For each sugar and sweetener value, keep the rows from the latest update
food_supply_quantity_kg_data <- global_food_data %>%
  group_by(Sugar...Sweeteners) %>%
  filter(Last_Update == max(Last_Update, na.rm = TRUE))
# Density plot of the share of global sugar and sweetener consumption,
# with COVID-19 deaths mapped to the fill aesthetic
food_supply_quantity_kg_data %>%
  ggplot(aes(x = Sugar...Sweeteners, fill = country_deaths)) +
  geom_density()
This density plot explores the relationship between global sugar and sweetener consumption and global COVID-19 deaths by showing how the share of sugar consumption is distributed alongside the death figures.
The plot suggests a relationship between the percentage of sugar consumption and the percentage of deaths: higher sweetener consumption appears to coincide with more COVID-19-related deaths.
From this observation, diets with a higher rate of sugar consumption may affect a nation’s or person’s susceptibility to COVID-19 exposure and death, although a correlation alone does not establish that sugar consumption is the cause.
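A rank correlation is one way to put a number on the relationship described above. The sketch below assumes the columns `Sugar...Sweeteners` and `country_deaths` used in the plotting code; the helper name `sugar_death_association` and the toy values are hypothetical and only demonstrate the call.

```r
# Hypothetical helper: Spearman rank correlation between sugar share and
# deaths, using only rows where both values are present.
sugar_death_association <- function(df) {
  cor(df$Sugar...Sweeteners, df$country_deaths,
      method = "spearman", use = "complete.obs")
}

# Toy values for illustration only; they are not taken from the data set.
toy_sugar <- data.frame(
  Sugar...Sweeteners = c(1.2, 3.4, 2.8, NA, 5.1, 4.0),
  country_deaths     = c(120, 900, 450, 300, 700, NA)
)

rho <- sugar_death_association(toy_sugar)
print(rho)
```

A value near 1 or -1 would indicate a strong monotonic association and a value near 0 a weak one; in either case the coefficient describes association, not cause.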
# Load the packages
library("ggplot2")
library("dplyr")
library("leaflet")
library("plotly")
# Cases and deaths by country (scatter plot on log-log axes)
food_and_covid <- read.csv("../data/global_food_and_covid.csv")
covid_specific <- food_and_covid %>%
  select(Country_Region, country_deaths, country_cases, country_fatality_ratio)
chart2 <- plot_ly(
  data = covid_specific,
  x = ~country_cases,
  y = ~country_deaths,
  size = ~country_deaths,
  # Show the country name on hover
  text = ~Country_Region,
  hoverinfo = "text",
  type = "scatter",
  # Set the mode explicitly so plotly does not have to guess it
  mode = "markers",
  showlegend = FALSE
) %>%
  layout(
    title = "COVID-19 Mortality Ratio",
    # On a log axis, the range is given in log10 units
    xaxis = list(range = c(log10(1000), log10(125000000)), title = "Cases", type = "log"),
    yaxis = list(title = "Deaths", type = "log")
  )
chart2
Because nutrition may affect COVID-19 outcomes through its effect on immunity, this chart first establishes the number of COVID-19 cases and deaths in each country and the ratio between them. These figures can then be compared with each country’s food nutrient ratios to analyze the impact of nutrition on COVID-19.
According to the scatter plot, almost all countries have a COVID-19 mortality rate below 10%. For a country with extremely high reported mortality, such as Belgium (58% death ratio in this data set), the food data show a proportion of meat and fruit far lower than in countries with low mortality. One possible explanation is that insufficient intake of protein and vitamins leads to decreased immunity, though collection biases in the reported counts could also contribute.
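Before interpreting an extreme value such as the one above, it may help to recompute the ratio from the raw counts and list the countries that exceed the 10% level mentioned in the text. This is a minimal sketch, assuming the columns `Country_Region`, `country_deaths`, and `country_cases` from the plotting code; the helper `flag_high_fatality`, its `threshold` argument, and the toy rows are hypothetical.

```r
# Hypothetical helper: recompute deaths / cases and return the countries
# whose ratio exceeds `threshold`, highest first.
flag_high_fatality <- function(df, threshold = 0.10) {
  df$recomputed_ratio <- df$country_deaths / df$country_cases
  keep <- !is.na(df$recomputed_ratio) & df$recomputed_ratio > threshold
  flagged <- df[keep, ]
  flagged[order(-flagged$recomputed_ratio), ]
}

# Toy rows for illustration only; they are not taken from the data set.
toy_cases <- data.frame(
  Country_Region = c("A", "B", "C", "D"),
  country_deaths = c(50, 400, 10, NA),
  country_cases  = c(1000, 2000, 5000, 300)
)

high <- flag_high_fatality(toy_cases)
print(high)
```

Rows returned by such a check are candidates for a closer look at how their cases and deaths were recorded before any dietary explanation is attached to them.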
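# Obesity and COVID-19 deaths from the food supply quantity data
food_global_data <- read.csv("../data/Food_Supply_Quantity_kg_Data.csv")
obesity_deaths <- food_global_data %>%
  select(Obesity, Deaths)
# Histogram of obesity rates, with deaths mapped to the fill aesthetic
ggplot(obesity_deaths, aes(x = Obesity, fill = Deaths)) +
  geom_histogram()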
This histogram explores the relationship between the percentage of obesity in each country and the percentage of deaths from COVID-19, comparing death percentages against obesity percentages worldwide.
In the histogram, there appears to be a distinct relationship between the obesity data and COVID-19-related deaths: as obesity increases, the death figures appear to increase as well.
From these observations, a higher obesity rate within a region or country may increase how susceptible an individual is to contracting and dying from COVID-19, although this association does not by itself show that obesity is the cause.
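One way to make the comparison described above explicit is to group countries into obesity bands and summarize deaths within each band. The sketch below assumes the `Obesity` and `Deaths` columns selected in the code above; the helper `deaths_by_obesity_band`, the band edges, and the toy values are hypothetical.

```r
# Hypothetical helper: median death percentage within each obesity band.
deaths_by_obesity_band <- function(df, breaks = c(0, 10, 20, 30, 100)) {
  # Drop rows where either value is missing.
  df <- df[complete.cases(df[, c("Obesity", "Deaths")]), ]
  # Assign each country to a band such as (10,20].
  df$band <- cut(df$Obesity, breaks = breaks, include.lowest = TRUE)
  aggregate(Deaths ~ band, data = df, FUN = median)
}

# Toy values for illustration only; they are not taken from the data set.
toy_obesity <- data.frame(
  Obesity = c(4.5, 8.0, 15.2, 22.3, 27.9, 35.0, NA),
  Deaths  = c(0.01, 0.02, 0.03, 0.05, 0.04, 0.06, 0.02)
)

band_summary <- deaths_by_obesity_band(toy_obesity)
print(band_summary)
```

The median is used here rather than the mean so that a few countries with very high death percentages do not dominate a band; that choice is an assumption of this sketch.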